The Application of Text Mining Software to Examine Coded Information
Abstract
The purpose of this paper is to examine the use of text mining to investigate coded information. Inventory codes are usually nominal rather than ordinal. However, codes are not completely devoid of meaning: items in the inventory that are similar are generally assigned the same base number, with added digits to distinguish between individual items. The standard approach to using such data in statistical models that require at least ordinal data is to create dummy variables, i.e., indicators for the presence or absence of an item. Instead, it is possible to treat the code as a text word and to use text analysis to examine the data. In this paper, two examples from the healthcare industry are examined. The first examines ICD-9 codes that are used in Medicare billing to list patient risk factors and complications. The second examines medications routinely prescribed to patients in open heart surgery.
Introduction
Coded information is difficult to analyze with traditional statistical methods. A large retail inventory can contain hundreds or thousands of different codes. For example, the Office Depot web site contains 13 major categories of office supplies. The pencils and pens category contains another 5 subcategories, with 11 further refined categories for pens alone. Rollerball pens for one manufacturer have
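The contrast between dummy variables and treating a code as a text word can be illustrated with a minimal sketch. It is not the paper's method or software: the function names `code_tokens` and `term_matrix`, the three-character stem length, and the patient records are all assumptions made up for illustration. Each code becomes a token, and a second "stem" token (the leading characters of the code) lets related codes share a term, which plain presence/absence indicators cannot do.

```python
from collections import Counter

def code_tokens(codes, stem_len=3):
    """Turn one record's list of codes into text tokens.

    Each full code is kept as a token, and its first `stem_len`
    characters (marked with '*') are added as a second token so that
    codes sharing a base number also share a term.
    The stem length of 3 is an assumption, not taken from the paper.
    """
    tokens = []
    for code in codes:
        tokens.append(code)
        tokens.append(code[:stem_len] + "*")
    return tokens

def term_matrix(records, stem_len=3):
    """Build a term-count matrix with one row per record."""
    bags = [Counter(code_tokens(r, stem_len)) for r in records]
    vocab = sorted(set().union(*bags))
    rows = [[bag[term] for term in vocab] for bag in bags]
    return vocab, rows

# Hypothetical patient records (made-up data, ICD-9 style codes).
records = [
    ["250.00", "401.9"],
    ["250.01", "428.0"],
    ["414.01", "401.9"],
]

vocab, rows = term_matrix(records)
for term, column in zip(vocab, zip(*rows)):
    print(f"{term:8s} {column}")
```

With dummy variables alone, the first two records would have no column in common; here they share the `250*` stem term, which is the kind of similarity between related codes that a text-based treatment is meant to preserve.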
Similar resources
A Model for Extracting Information from Text Documents Based on Text Mining in the E-Learning Domain
As computer networks become the backbone of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...
Prediction of user's trustworthiness in web-based social networks via text mining
In social networks, users need a proper estimate of trust in others to be able to initiate reliable relationships. Some trust evaluation mechanisms have been proposed that use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...
User’s Interaction with Information through eFront Learning Management System
Background and Aim: To understand interactive content and content production standards, as well as users' interaction with LMSs and their behavior in dealing with information, this paper examines users' interaction with the information provided in the eFront application, an open source Learning Management System, with emphasis on the SCORM standard. Method: The method that used...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
Global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...